{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
   "metadata": {},
   "source": [
    "# Pidgin English Technical Assistant\n",
    "\n",
    "To demonstrate my familiarity with Open AI, I built a tool that explains technical concepts using Nigerian Pidgin English. It is a part of the Week One Exercise for the LLM Engineering Course"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "c1070317-3ed9-4659-abe3-828943230e03",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "import os\n",
    "from dotenv import load_dotenv\n",
    "from IPython.display import Markdown, display, update_display\n",
    "from openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# constants\n",
    "\n",
    "MODEL_GPT = 'gpt-4o-mini'\n",
    "MODEL_LLAMA = 'llama3.2'\n",
    "OLLAMA_BASE_URL = \"http://localhost:11434/v1\"\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "API key found and looks good so far!\n"
     ]
    }
   ],
   "source": [
    "# set up environment\n",
    "load_dotenv(override=True)\n",
    "api_key=os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "if not api_key:\n",
    "    print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
    "elif not api_key.startswith(\"sk-proj-\"):\n",
    "    print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
    "else:\n",
    "    print(\"API key found and looks good so far!\")\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "eae442e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# initializing  models\n",
    "\n",
    "openai = OpenAI()\n",
    "ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
   "metadata": {},
   "outputs": [],
   "source": [
    "# here is the question; type over this to ask something new\n",
    "system_prompt = \"\"\"\n",
    "You are a technical assistant. You are to expect technical questions. You are to respond with detailed explanations to the technical question. You are to respond in Nigerian Pidgin English\n",
    "\"\"\"\n",
    "\n",
    "question = \"\"\"\n",
    "Please explain what tokens are in LLM\n",
    "\"\"\"\n",
    "\n",
    "\n",
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": system_prompt},\n",
    "    {\"role\": \"user\", \"content\": question},\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de89a835",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Modular  stream results function. Accepts any model and version and parses the response into an output stream\n",
    "\n",
    "def stream_results(model,version):\n",
    "    stream = model.chat.completions.create(\n",
    "        model=version, messages=messages, stream=True\n",
    "    )\n",
    "    \n",
    "    response = \"\"\n",
    "    display_handle = display(Markdown(\"\"), display_id=True)\n",
    "    for chunk in stream:\n",
    "        response += chunk.choices[0].delta.content or ''\n",
    "        update_display(Markdown(response), display_id=display_handle.display_id)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "Tokens no be anything wey fit represent small pieces of data wey language models like LLM (Large Language Models) fit understand. If you sabi say, language no be just about words; e involve characters, punctuation, and sometimes even spaces.\n",
       "\n",
       "For example, if you write \"I go market,\" the model go break am down into tokens. Each word fit be one token, but sometimes e fit break words down further depending on how e fit fit understand am. So, \"market\" fit be one token, \"I\" fit be another token, and \"go\" fit still be another token.\n",
       "\n",
       "Reason why tokens dey important be say, LLM dem dey analyze texts using these tokens. The model dey learn the relationship wey dey between tokens and wetin dey follow them. For instance, if you dey write \"I go,\" the model go sabi say e fit see \"market\" after \"go.\" \n",
       "\n",
       "Another thing wey dey happen be say, when you dey use LLM for text generation or any other language task, you go dey interact with the model in tokens. So, everybody dey use tokens to set the ground rules for how the model go respond.\n",
       "\n",
       "For different languages, the way wey tokens dem dey form fit differ. Some languages dey use long words wey fit don be many characters, while others dey use small words. \n",
       "\n",
       "In summary, tokens na the building blocks wey LLM dey use to represent text, and understanding how tokens work fit help you to better use language models for whatever purpose wey you get."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Get gpt-4o-mini to answer, with streaming\n",
    "stream_results(openai,MODEL_GPT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "Master, I go explain token tins LLM dey use.\n",
       "\n",
       "In Larger Language Model (LLM), \"token\" be small unit of text data wein input and output. Like say we got big sentence for you to input, LLM chop am into tiny pieces called tokens.\n",
       "\n",
       "Imagine you go tell LLM one sentence: 'I love eating jollof rice.' \n",
       "\n",
       "LLM chop dis sentence into tokens like dis:\n",
       "\n",
       "1. [Token 1] I\n",
       "2. [Token 2] love\n",
       "3. [Token 3] eating\n",
       "4. [Token 4] jollof\n",
       "5. [Token 5] rice\n",
       "\n",
       "Dis many small pieces we call token. Token be single unit of text data, and dem fit enter model for processing.\n",
       "\n",
       "Dese tokens get different types depending on where dey dey used:\n",
       "\n",
       "1. **Wordpieces**: Dey be small subset of words that make one word.\n",
       "   For example: 'love' get chopped into smaller token like '[Token 2] love-'\n",
       "  \n",
       "2. **Subword**: Dey be very small piece or fragment of a word.\n",
       "\n",
       "3. **Special tokens**: They be ushin tokens for input (usually \"[CLS]\") and output (usually \"[SEP\"]).\n",
       "\n",
       "Dese special tokens help LLM no get confused where dem dey exactly begin or end in certain context.\n",
       "\n",
       "When you want to ask me one question, whether e be tech ques or anyin, I fit answer am well."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Get Llama 3.2 to answer\n",
    "stream_results(ollama,MODEL_LLAMA)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
